---
title: "Benefit-Seekers or Principle-Holders? Experimental Evidence on Americans' Democratic Trade-offs - Analysis Pipeline README"
author: "Chloe Mortenson & Dr. Erik Nisbet"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    theme: cosmo
    highlight: tango
    code_folding: show
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE, warning = FALSE, message = FALSE)
```

# Project Overview

This README documents the complete analysis pipeline for the **Benefit-Seekers or Principle-Holders? Experimental Evidence on Americans' Democratic Trade-offs** study, published in **Perspectives on Politics**. The project examines how citizens understand and value different dimensions of democracy using a conjoint experimental design combined with Thurstone Case V scaling.

## Study Components

1. **Thurstone Case V Scaling**: Measures individual understanding across three democracy dimensions
   - Distributive Democracy
   - Procedural Democracy
   - Liberal Democracy

2. **Conjoint Experiment**: Evaluates preferences for democratic attributes
   - Economic Well-being
   - Freedom of Expression
   - Rule of Law
   - Political Equality

3. **Advanced Statistical Modeling**:
   - AMCE (Average Marginal Component Effects)
   - CJBART (Conjoint Analysis Bayesian Additive Regression Trees)
   - IMCE (Individual Marginal Component Effects)
   - VIMP (Variable Importance)

---

# File Structure

## Input Files
- **Raw Data**: Qualtrics survey CSV file containing:
  - Respondent demographics (age, education, ideology, race, political interest)
  - Democracy understanding items (`understanding_dem_*`)
  - Attention check items (`attent_*`)
  - Conjoint choice responses (`C1`-`C5`)
  - Feature attributes (`feature*_CBCONJOINT`)

## Analysis Scripts

### 1. Analysis_Code_PP2025.Rmd
Main data cleaning and statistical analysis script that:
- Cleans and prepares conjoint data
- Performs Thurstone Case V scaling
- Runs AMCE analysis
- Fits CJBART models
- Calculates IMCE and VIMP

### 2. Figure_Code_PPfinal.Rmd
Visualization script that generates publication-ready figures for:
- Thurstone scaling results
- AMCE plots
- Interaction graphs
- VIMP heatmaps
- IMCE density plots
- Individual-level relationship plots

## Output Files

The analysis pipeline creates the following RDS files:

| File Name | Description |
|-----------|-------------|
| `final_fixed_data.rds` | Cleaned and restructured conjoint data ready for modeling |
| `final_fixed_amce.rds` | Average Marginal Component Effects results |
| `final_fixed_cjbart.rds` | CJBART model object (500 trees, 10,000 post-burn-in draws) |
| `final_fixed_imce.rds` | Individual Marginal Component Effects for each respondent |
| `final_vimp_results.rds` | Variable importance scores showing heterogeneity predictors |

---

# Analysis Workflow

## STEP 1: Data Preparation and Thurstone Scaling

### 1.1 Load and Clean Data

```{r data-loading}
# Load required libraries
library(tidyr)
library(dplyr)
library(stringr)
library(irr)       # For Kendall's W
library(psych)     # Additional statistics
library(survival)  # Conditional logit

# Read raw survey data
dt <- read.csv("path/to/your/survey_data.csv", header = TRUE)

# Remove first two rows (Qualtrics metadata)
dt <- dt %>% slice(-1, -2)
```

### 1.2 Define Variables

```{r variable-definitions}
# Democracy understanding dimensions
distributive_items <- c("understanding_dem_4", "understanding_dem_5", "understanding_dem_6")
procedural_items <- c("understanding_dem_7", "understanding_dem_8", "understanding_dem_9")
liberal_items <- c("understanding_dem_11", "understanding_dem_12", "understanding_dem_13")

# Demographic controls
z_vars <- c("age", "educ", "idpol", "idsoc", "eideo", "race")

# Conjoint variables
choice_cols <- c("C1", "C2", "C3", "C4", "C5")
feature_cols <- grep("^feature\\d+\\.\\d+\\.\\d+_CBCONJOINT$", names(dt), value = TRUE)
```

### 1.3 Create Composite Variables

```{r composite-variables}
# Political interest (mean of attention items)
dt <- dt %>%
  mutate(polint = rowMeans(dt[, attent_cols], na.rm = TRUE))

# Binary race indicator (White = 1, Other = 0)
dt <- dt %>%
  mutate(race_binary = ifelse(race == 1, 1, 0))

# Overall ideology (mean of political, social, economic)
dt <- dt %>%
  mutate(ideo = rowMeans(dt[, c("idpol", "idsoc", "eideo")], na.rm = TRUE))
```

### 1.4 Run Thurstone Case V Analysis

The Thurstone analysis converts ranking data into interval-scaled values representing the "psychological distance" between democracy concepts.

```{r thurstone-analysis}
# Function performs pairwise comparisons and converts to z-scores
thurstone_case_v <- function(data) {
  # Calculate Kendall's W for agreement
  kendalls_w <- kendall(as.matrix(data), correct = TRUE)$value
  
  # Build preference matrix
  n_items <- ncol(data)
  n_subjects <- nrow(data)
  freq_matrix <- matrix(0, nrow = n_items, ncol = n_items)
  
  # Count preferences
  for (i in 1:n_subjects) {
    for (j in 1:n_items) {
      for (k in 1:n_items) {
        if (j != k && data[i, j] < data[i, k]) {
          freq_matrix[j, k] <- freq_matrix[j, k] + 1
        }
      }
    }
  }
  
  # Convert to proportions with Laplace smoothing
  prop_matrix <- (freq_matrix + 0.5) / (n_subjects + 1)
  
  # Transform to z-scores (assumes normal distribution)
  z_matrix <- qnorm(prop_matrix)
  diag(z_matrix) <- 0
  
  # Calculate scale values (mean z-score for each item)
  scale_values <- rowMeans(z_matrix, na.rm = TRUE)
  scale_values <- scale_values - mean(scale_values)  # Center at 0
  
  # Calculate standard errors
  se_values <- apply(z_matrix, 1, function(x) sd(x, na.rm = TRUE) / sqrt(n_items - 1))
  
  return(list(
    scale_values = scale_values,
    se = se_values,
    kendalls_w = kendalls_w
  ))
}

# Run for each dimension
distributive_results <- thurstone_case_v(distributive_data)
procedural_results <- thurstone_case_v(procedural_data)
liberal_results <- thurstone_case_v(liberal_data)
```

**Key Statistics**:
- **Kendall's W**: Measures inter-rater agreement (0 = no agreement, 1 = perfect agreement)
- **Scale Values**: Higher values indicate stronger endorsement of that democracy aspect
- **Standard Errors**: Quantify uncertainty in scale estimates

---

## STEP 2: Conjoint Data Restructuring

### 2.1 Reshape from Wide to Long Format

The conjoint data must be converted from "one row per respondent" to "one row per profile comparison."

```{r reshape-conjoint}
library(cregg)

# Reshape using cregg package
long_data <- reshape_conjoint(
  data = dt,
  id_cols = "ResponseId",
  choice_cols = c("C1", "C2", "C3", "C4", "C5"),
  feature_cols = feature_cols,
  profile_cols = 1:2  # Two profiles per choice task
)
```

### 2.2 Recode Attribute Levels

Create meaningful factor levels for each democratic attribute:

```{r recode-attributes}
model_data_fixed <- long_data %>%
  mutate(
    # Economic Well-being
    Econ_Wellbeing = case_when(
      Econ_Wellbeing == "Econ_Wellbeing-1" ~ "Econ_Normative",
      Econ_Wellbeing == "Econ_Wellbeing-2" ~ "Econ_Majority", 
      Econ_Wellbeing == "Econ_Wellbeing-3" ~ "Econ_Minority"
    ),
    
    # Press Freedom
    Press_Freedom = case_when(
      Press_Freedom == "Press_Freedom-1" ~ "Press_Normative",
      Press_Freedom == "Press_Freedom-2" ~ "Press_Majority",
      Press_Freedom == "Press_Freedom-3" ~ "Press_Minority"
    ),
    
    # Rule of Law
    Rule_Law = case_when(
      Rule_Law == "Rule_Law-1" ~ "Law_Normative",
      Rule_Law == "Rule_Law-2" ~ "Law_Majority",
      Rule_Law == "Rule_Law-3" ~ "Law_Minority"
    ),
    
    # Presidential Role (Political Equality)
    Presidential_Role = case_when(
      Presidential_Role == "Presidential_Role-1" ~ "Pres_Normative",
      Presidential_Role == "Presidential_Role-2" ~ "Pres_Majority",
      Presidential_Role == "Presidential_Role-3" ~ "Pres_Minority"
    )
  ) %>%
  # Convert to factors
  mutate(across(c(Econ_Wellbeing, Press_Freedom, Rule_Law, Presidential_Role), as.factor))
```

### 2.3 Merge Thurstone Scores

Add democracy understanding scores to each observation:

```{r merge-scores}
# Extract scale values
dt$Liberal_Score <- liberal_results$scale_values[dt$understanding_dem_11]
dt$Procedural_Score <- procedural_results$scale_values[dt$understanding_dem_7]
dt$Distributive_Score <- distributive_results$scale_values[dt$understanding_dem_4]

# Join with conjoint data
model_data_fixed <- model_data_fixed %>%
  left_join(dt %>% select(ResponseId, Liberal_Score, Procedural_Score, 
                         Distributive_Score, age, educ, ideo, polint, race_binary),
            by = "ResponseId")
```

---

## STEP 3: AMCE Analysis

Average Marginal Component Effects show the population-average impact of each attribute level.

```{r amce-analysis}
library(cregg)

# Calculate AMCE with clustered standard errors
fixed_amce <- cj(
  data = model_data_fixed,
  formula = selected ~ Econ_Wellbeing + Press_Freedom + Rule_Law + Presidential_Role,
  id = ~ResponseId,
  estimate = "amce"
)

# View results
print(fixed_amce)
```

**Interpretation**: 
- Positive coefficients = increased probability of profile selection
- Reference categories: "Normative" levels
- Confidence intervals that exclude zero indicate significant effects

---

## STEP 4: CJBART Modeling

CJBART uses Bayesian machine learning to capture complex non-linear relationships and heterogeneity.

### 4.1 Prepare Data for CJBART

```{r cjbart-prep}
library(cjbart)

# Create respondent-level constants (demographics/scores)
respondent_demographics <- model_data_fixed %>%
  group_by(ResponseId) %>%
  summarise(
    age_const = first(age[!is.na(age)]),
    educ_const = first(educ[!is.na(educ)]),
    ideo_const = first(ideo[!is.na(ideo)]),
    polint_const = first(polint[!is.na(polint)]),
    race_binary_const = first(race_binary[!is.na(race_binary)]),
    Liberal_Score_const = first(Liberal_Score[!is.na(Liberal_Score)]),
    Procedural_Score_const = first(Procedural_Score[!is.na(Procedural_Score)]),
    Distributive_Score_const = first(Distributive_Score[!is.na(Distributive_Score)]),
    .groups = "drop"
  )

# Join back to main data
cjbart_data <- model_data_fixed %>%
  select(ResponseId, task, profile, selected, 
         Econ_Wellbeing, Press_Freedom, Rule_Law, Presidential_Role) %>%
  left_join(respondent_demographics, by = "ResponseId") %>%
  mutate(selected = as.integer(selected))
```

### 4.2 Fit CJBART Model

```{r cjbart-fit}
# Fit model with Bayesian MCMC
fixed_cjbart_model <- cjbart(
  data = cjbart_data,
  Y = "selected",
  id = "ResponseId", 
  type = "choice",
  cores = 6,              # Parallel processing
  ntree = 500,            # Number of trees in ensemble
  nskip = 10000,          # Burn-in iterations
  ndpost = 10000,         # Post-burn-in draws
  keepevery = 10,         # Thinning interval
  keeptrainfits = TRUE,
  usequants = TRUE,
  printevery = 1000
)
```

**Model Parameters**:
- **ntree = 500**: Ensemble of 500 regression trees
- **nskip = 10000**: Discard first 10,000 iterations (burn-in)
- **ndpost = 10000**: Keep 10,000 posterior samples
- **keepevery = 10**: Use every 10th draw (reduces autocorrelation)

### 4.3 Check Convergence

```{r convergence-check}
library(coda)

# Convert sigma draws to MCMC object
sigma_mcmc <- mcmc(fixed_cjbart_model$sigma)

# Geweke diagnostic (compares first 10% vs last 50% of chain)
geweke_test <- geweke.diag(sigma_mcmc)

# Should be |Z| < 1.96 for convergence
print(geweke_test)

# Effective sample size (should be > 1000)
print(effectiveSize(sigma_mcmc))
```

---

## STEP 5: IMCE Calculation

Individual Marginal Component Effects reveal person-specific preferences, capturing heterogeneity.

```{r imce-calculation}
# Define reference levels (baselines for comparison)
ref_levels_fixed <- c(
  "Econ_Normative", "Press_Normative", 
  "Law_Normative", "Pres_Normative"
)
names(ref_levels_fixed) <- c("Econ_Wellbeing", "Press_Freedom", 
                              "Rule_Law", "Presidential_Role")

# Calculate IMCEs
fixed_imce_results <- IMCE(
  data = cjbart_data,
  model = fixed_cjbart_model,
  attribs = c("Econ_Wellbeing", "Press_Freedom", "Rule_Law", "Presidential_Role"),
  ref_levels = ref_levels_fixed,
  method = "bayes",
  alpha = 0.05,
  cores = 1
)
```

**Output Structure**:
- `fixed_imce_results$imce`: Data frame with one row per respondent
- Columns: `ResponseId`, `Econ_Majority`, `Econ_Minority`, `Press_Majority`, etc.
- Values: Individual effect sizes relative to normative reference

**Interpretation Example**:
- If respondent #123 has `Econ_Majority = 0.15`, they are 15 percentage points more likely to choose a profile with majority-focused economic policies compared to normative economic policies

---

## STEP 6: VIMP Calculation

Variable Importance identifies which covariates (demographics, democracy scores) predict heterogeneity in conjoint preferences.

```{r vimp-calculation}
# Calculate VIMP using heterogeneity test
vimp_results <- het_vimp(
  fit = fixed_cjbart_model,
  imces = fixed_imce_results,
  attribs = c("Econ_Wellbeing", "Press_Freedom", "Rule_Law", "Presidential_Role"),
  covars = c(
    "Liberal_Score_const", "Procedural_Score_const", 
    "Distributive_Score_const", "educ_const", 
    "polint_const", "ideo_const", "age_const"
  )
)

# Package results for figure script
vimp_package <- list(
  results = vimp_results,
  calculation_date = Sys.time(),
  model_info = list(
    ntree = 500,
    nskip = 10000,
    ndpost = 10000
  )
)
```

**VIMP Scores**:
- Range: 0-100%
- Higher scores = stronger predictor of individual preference heterogeneity
- Separate scores for each (Attribute × Level × Covariate) combination

---

## STEP 7: Save Results

```{r save-results}
# Create output directory
if (!dir.exists("output")) {
  dir.create("output")
}

# Save all analysis objects
saveRDS(model_data_fixed, "output/final_fixed_data.rds")
saveRDS(fixed_amce, "output/final_fixed_amce.rds")
saveRDS(fixed_cjbart_model, "output/final_fixed_cjbart.rds")
saveRDS(fixed_imce_results, "output/final_fixed_imce.rds")
saveRDS(vimp_package, "output/final_vimp_results.rds")

cat("✅ All analysis results saved successfully!\n")
```

---

# Visualization Workflow

## Prerequisites

**IMPORTANT**: Run the complete analysis script first to generate all required `.rds` files.

## Required Packages

```{r viz-packages}
library(dplyr)
library(ggplot2)
library(cregg)
library(cjbart)
library(tidyr)
library(gridExtra)
library(cowplot)
library(coda)
library(kableExtra)
```

## Load Analysis Results

```{r load-results}
# Load saved analysis objects
model_data_fixed <- readRDS("output/final_fixed_data.rds")
fixed_amce <- readRDS("output/final_fixed_amce.rds")
cjbart_model <- readRDS("output/final_fixed_cjbart.rds")
fixed_imce <- readRDS("output/final_fixed_imce.rds")
vimp_results <- readRDS("output/final_vimp_results.rds")

cat("✅ All data loaded successfully!\n")
```

## Figure Generation

### Figure 1: Convergence Diagnostics

```{r fig-diagnostics}
# Sigma trace plot
sigma_df <- data.frame(
  iteration = 1:length(cjbart_model$sigma),
  sigma = cjbart_model$sigma
)

ggplot(sigma_df, aes(x = iteration, y = sigma)) +
  geom_line(color = "steelblue") +
  labs(title = "CJBART Sigma Convergence", 
       x = "Iteration", y = "Sigma") +
  theme_minimal()

ggsave("figures/diagnostic_sigma_trace.png", width = 10, height = 6, dpi = 300)
```

### Figure 2: Marginal Means Plot

Shows average preferences for all attribute levels across all respondents.

```{r fig-marginal-means}
# Calculate marginal means
mm_plot_data <- cj(
  model_data_fixed,
  selected ~ Econ_Wellbeing + Press_Freedom + Rule_Law + Presidential_Role,
  id = ~ResponseId,
  estimate = "mm"
)

# Create plot
ggplot(mm_plot_data, aes(x = level, y = estimate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  facet_wrap(~feature, scales = "free_x") +
  coord_flip() +
  labs(title = "Marginal Means of Democratic Attributes",
       x = "", y = "Probability of Selection") +
  theme_minimal()

ggsave("figures/fig2_all_attributes_mm.png", width = 12, height = 8, dpi = 300)
```

### Figure 3-5: Interaction Plots

Examine how one attribute moderates preferences for another.

```{r fig-interactions}
# Rule of Law × Economic Wellbeing
cj_interaction <- cj(
  model_data_fixed,
  selected ~ Rule_Law,
  id = ~ResponseId,
  estimate = "mm",
  by = ~Econ_Wellbeing
)

ggplot(cj_interaction, aes(x = level, y = estimate, color = BY)) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), 
                position = position_dodge(width = 0.5), width = 0.2) +
  labs(title = "Rule of Law by Economic Well-being",
       x = "Rule of Law Level", y = "Marginal Mean",
       color = "Economic Well-being") +
  theme_minimal()

ggsave("figures/fig3_rule_of_law_interaction.png", width = 10, height = 6, dpi = 300)
```

### Figure 6: IMCE Density Plots

Visualizes the distribution of individual effects across all respondents.

```{r fig-imce-densities}
# Extract IMCE data
imce_df <- fixed_imce$imce %>%
  select(ResponseId, Econ_Majority, Econ_Minority, 
         Press_Majority, Press_Minority,
         Law_Majority, Law_Minority,
         Pres_Majority, Pres_Minority) %>%
  pivot_longer(-ResponseId, names_to = "Effect", values_to = "Value")

# Create density plot
ggplot(imce_df, aes(x = Value, fill = Effect)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  facet_wrap(~Effect, scales = "free") +
  labs(title = "Distribution of Individual Marginal Component Effects",
       x = "Effect Size", y = "Density") +
  theme_minimal() +
  theme(legend.position = "none")

ggsave("figures/fig6_imce_densities.png", width = 12, height = 10, dpi = 300)
```

### Figure 7: VIMP Heatmap

Shows which covariates predict heterogeneity in preferences for each attribute level.

```{r fig-vimp-heatmap}
# Process VIMP data
vimp_df <- vimp_results$results$results %>%
  mutate(
    importance = pmin(importance * 100, 100),
    covar_label = case_when(
      covar == "Liberal_Score_const" ~ "Liberal Democracy",
      covar == "Procedural_Score_const" ~ "Procedural Democracy",
      covar == "Distributive_Score_const" ~ "Distributive Democracy",
      covar == "educ_const" ~ "Education",
      covar == "polint_const" ~ "Political Interest",
      covar == "ideo_const" ~ "Ideology",
      covar == "age_const" ~ "Age"
    ),
    Level_Clean = case_when(
      grepl("Majority", Level) ~ "Majority",
      grepl("Minority", Level) ~ "Minority"
    ),
    Attribute_Display = case_when(
      Attribute == "Presidential_Role" ~ "Political Equality",
      Attribute == "Press_Freedom" ~ "Freedom of Expression",
      Attribute == "Rule_Law" ~ "Rule of Law",
      Attribute == "Econ_Wellbeing" ~ "Economic Well-being"
    )
  ) %>%
  filter(Level_Clean %in% c("Majority", "Minority"))

# Create heatmap
ggplot(vimp_df, aes(x = covar_label, y = Level_Clean, fill = importance)) +
  facet_grid(Attribute_Display ~ ., scales = "free") +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.1f", importance)), 
            color = ifelse(vimp_df$importance > 50, "white", "black"),
            size = 4) +
  scale_fill_gradient(low = "white", high = "firebrick2", 
                     limits = c(0, 100),
                     name = "Importance (%)") +
  labs(title = "Variable Importance in Predicting Individual Preferences",
       x = "Democracy Understanding and Individual Attributes",
       y = "Attribute Level") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave("figures/fig7_vimp_heatmap.png", width = 12, height = 8, dpi = 300)
```

### Figures 8-9: Individual-Level Relationship Plots

Shows how individual effects vary by democracy understanding scores.

```{r fig-individual-relationships}
# Function to create quartile comparison plots
create_quartile_graph <- function(imce_data, outcome_var, covariate_var, 
                                  covariate_label, outcome_label) {
  
  plot_data <- data.frame(
    ResponseId = imce_data$imce$ResponseId,
    outcome = imce_data$imce[[outcome_var]],
    covariate = imce_data$imce[[covariate_var]]
  ) %>% 
    filter(!is.na(covariate) & !is.na(outcome)) %>%
    arrange(outcome) %>%
    mutate(effect_order = 1:n())
  
  # Assign quartiles
  quartiles <- quantile(plot_data$covariate, probs = c(0, 0.25, 0.5, 0.75, 1))
  plot_data$quartile <- cut(plot_data$covariate, breaks = quartiles, 
                            labels = c("Q1 (Lowest)", "Q2", "Q3", "Q4 (Highest)"),
                            include.lowest = TRUE)
  
  # Effect line plot
  effect_line <- ggplot(plot_data, aes(x = effect_order, y = outcome)) +
    geom_line(color = "black") +
    geom_hline(yintercept = 0, linetype = "dashed") +
    labs(title = paste("Individual Effects:", outcome_label),
         x = "", y = outcome_label) +
    theme_minimal()
  
  # Quartile histogram
  quartile_hist <- ggplot(plot_data, aes(x = effect_order, fill = quartile)) +
    geom_histogram(bins = 100, position = "stack") +
    labs(x = "Individual Respondents", y = "Count",
         fill = paste(covariate_label, "Quartiles")) +
    scale_fill_manual(values = c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3")) +
    theme_minimal() +
    theme(legend.position = "bottom")
  
  # Combine plots
  combined_plot <- plot_grid(effect_line, quartile_hist, 
                             ncol = 1, rel_heights = c(0.6, 0.4))
  
  return(combined_plot)
}

# Example: Liberal Democracy × Economic Wellbeing (Majority)
p8 <- create_quartile_graph(
  imce_data = fixed_imce,
  outcome_var = "Econ_Majority",
  covariate_var = "Liberal_Score_const",
  covariate_label = "Liberal Democracy Score",
  outcome_label = "Economic Well-being (Majority) Effect"
)

ggsave("figures/fig8_liberal_econ_majority.png", p8, 
       width = 12, height = 8, dpi = 300)
```

---

# Computational Requirements

## Software Dependencies

### R Version
- **Minimum**: R ≥ 4.0.0
- **Recommended**: R ≥ 4.3.0

### Required R Packages

```{r package-list}
# Data manipulation
install.packages("tidyr")
install.packages("dplyr")
install.packages("stringr")

# Statistics
install.packages("irr")        # Kendall's W
install.packages("psych")      # Descriptive stats
install.packages("survival")   # Conditional logit

# Conjoint analysis
install.packages("cregg")      # AMCE/MM calculations
install.packages("cjbart")     # Bayesian machine learning

# Visualization
install.packages("ggplot2")
install.packages("gridExtra")
install.packages("cowplot")
install.packages("kableExtra")

# MCMC diagnostics
install.packages("coda")
```

## Hardware Requirements

### Minimum Specifications
- **RAM**: 8 GB
- **Cores**: 4 CPU cores
- **Storage**: 2 GB free space

### Recommended for Large Datasets
- **RAM**: 16-32 GB (CJBART is memory-intensive)
- **Cores**: 6-8 CPU cores (for parallel processing)
- **Storage**: 5 GB (for model objects and figures)

## Runtime Estimates

Approximate processing times on a modern desktop (8 cores, 16 GB RAM):

| Step | Runtime |
|------|---------|
| Data cleaning & Thurstone | 2-5 minutes |
| Conjoint reshaping | 3-10 minutes |
| AMCE calculation | 1-2 minutes |
| CJBART model fitting | 30-90 minutes |
| IMCE calculation | 10-30 minutes |
| VIMP calculation | 15-45 minutes |
| Figure generation | 5-10 minutes |
| **Total Pipeline** | **~2-4 hours** |

**Note**: Runtime scales with:
- Number of respondents
- Number of MCMC iterations
- Number of trees in CJBART
- Available CPU cores

---

# Interpreting Results

## Thurstone Case V Outputs

### Scale Values
- **Interpretation**: Interval-level measurement of concept endorsement
- **Range**: Typically -2 to +2 (centered at 0)
- **Higher values** = stronger agreement/preference for that democracy aspect
- **Example**: If "economic equality" has scale value of 0.85 and "free speech" has -0.45, respondents rate economic equality substantially higher

### Kendall's W
- **Range**: 0 to 1
- **Interpretation**:
  - W < 0.3: Weak agreement
  - 0.3 ≤ W < 0.7: Moderate agreement
  - W ≥ 0.7: Strong agreement
- **Use**: Assesses reliability of ranking task

### Model Fit Statistics
- **R²**: Proportion of variance explained (higher = better)
- **RMSE**: Root mean squared error (lower = better)
- **Stress**: Kruskal's stress (< 0.1 = excellent, < 0.2 = good)

## AMCE Interpretation

### Coefficient Magnitude
- **Units**: Probability points (percentage points)
- **Example**: AMCE = 0.08 means "8 percentage point increase in selection probability"

### Statistical Significance
- Check if **95% CI excludes zero**
- p < 0.05 indicates significant difference from reference category

### Reference Category
- All effects are **relative to the normative level**
- Normative = traditional democratic theory definition

## IMCE Interpretation

### Individual Variability
- **Wide distribution** = high heterogeneity in preferences
- **Narrow distribution** = population is relatively homogeneous

### Effect Direction
- **Positive IMCE** = prefers this level over normative
- **Negative IMCE** = prefers normative over this level
- **Near zero** = indifferent

### Credible Intervals
- **Individual CIs** quantify uncertainty for each person
- Overlapping zero = effect not distinguishable from normative

## VIMP Interpretation

### Importance Scores (0-100%)
- **> 75%**: Very strong predictor of heterogeneity
- **50-75%**: Strong predictor
- **25-50%**: Moderate predictor
- **< 25%**: Weak or negligible predictor

### Reading the Heatmap
- **Rows**: Attribute levels (Majority/Minority)
- **Columns**: Covariates (democracy scores, demographics)
- **Cell color intensity**: Strength of relationship

### Key Questions VIMP Answers
1. Do democracy understanding scores predict preference heterogeneity?
2. Which demographic factors matter most?
3. Are certain attributes more influenced by individual differences?

---

# Troubleshooting

## Common Issues and Solutions

### Issue 1: CJBART Memory Error

**Error**: `Error: cannot allocate vector of size X Gb`

**Solutions**:
```{r memory-fix}
# Option 1: Reduce trees
ntree = 300  # Instead of 500

# Option 2: Reduce posterior samples
ndpost = 5000  # Instead of 10000

# Option 3: Increase thinning
keepevery = 20  # Instead of 10

# Option 4: Clear workspace before running
rm(list = ls())
gc()
```

### Issue 2: Convergence Warnings

**Warning**: `Geweke diagnostic suggests lack of convergence`

**Solutions**:
```{r convergence-fix}
# Increase burn-in
nskip = 20000  # Instead of 10000

# Run longer chains
ndpost = 15000  # Instead of 10000

# Check trace plots visually
plot(cjbart_model$sigma, type = "l")

# Calculate effective sample size
library(coda)
effectiveSize(mcmc(cjbart_model$sigma))
# Should be > 1000
```

### Issue 3: Factor Level Mismatches

**Error**: `factor X has new levels`

**Cause**: Inconsistent factor levels between datasets

**Solution**:
```{r factor-fix}
# Explicitly set factor levels BEFORE merging
model_data_fixed <- model_data_fixed %>%
  mutate(
    Econ_Wellbeing = factor(Econ_Wellbeing, 
                           levels = c("Econ_Normative", "Econ_Majority", "Econ_Minority")),
    Press_Freedom = factor(Press_Freedom,
                          levels = c("Press_Normative", "Press_Majority", "Press_Minority")),
    Rule_Law = factor(Rule_Law,
                     levels = c("Law_Normative", "Law_Majority", "Law_Minority")),
    Presidential_Role = factor(Presidential_Role,
                              levels = c("Pres_Normative", "Pres_Majority", "Pres_Minority"))
  )
```

### Issue 4: Missing Values in VIMP

**Error**: `NA/NaN/Inf in foreign function call`

**Solutions**:
```{r missing-fix}
# Check for missing values
sapply(cjbart_data, function(x) sum(is.na(x)))

# Remove rows with missing key variables
cjbart_data <- cjbart_data %>%
  filter(
    !is.na(selected) &
    !is.na(Econ_Wellbeing) &
    !is.na(Liberal_Score_const)
  )

# Check for infinite values
sapply(cjbart_data, function(x) sum(is.infinite(x)))
```

### Issue 5: Plotting Errors

**Error**: `object 'X' not found` in figure script

**Solution**:
```{r plot-fix}
# Verify all required files exist
required_files <- c(
  "final_fixed_data.rds",
  "final_fixed_amce.rds", 
  "final_fixed_cjbart.rds",
  "final_fixed_imce.rds",
  "final_vimp_results.rds"
)

missing <- required_files[!file.exists(required_files)]
if (length(missing) > 0) {
  stop(paste("Missing files:", paste(missing, collapse = ", ")))
}

# Load files in correct order
model_data_fixed <- readRDS("final_fixed_data.rds")
fixed_amce <- readRDS("final_fixed_amce.rds")
# ... etc
```

---

# Best Practices

## Data Quality Checks

### Before Analysis
```{r data-checks}
# Check sample size
cat("Total respondents:", length(unique(dt$ResponseId)), "\n")
cat("Total choice observations:", nrow(dt), "\n")

# Check completion rates
completion_rate <- mean(!is.na(dt$C1)) * 100
cat("Completion rate:", round(completion_rate, 1), "%\n")

# Check for speeders (< 5 minutes)
median_duration <- median(dt$Duration__in_seconds_, na.rm = TRUE)
cat("Median survey duration:", round(median_duration / 60, 1), "minutes\n")

# Check attention check pass rate
attention_pass_rate <- mean(dt$attention_check == "correct", na.rm = TRUE) * 100
cat("Attention check pass rate:", round(attention_pass_rate, 1), "%\n")
```

### After Modeling
```{r model-checks}
# CJBART diagnostics
cat("Effective sample size:", effectiveSize(mcmc(cjbart_model$sigma)), "\n")
cat("Geweke Z-score:", geweke.diag(mcmc(cjbart_model$sigma))$z, "\n")

# IMCE summary
cat("IMCE range:", 
    round(range(fixed_imce$imce$Econ_Majority), 3), "\n")
cat("IMCE median:", 
    round(median(fixed_imce$imce$Econ_Majority), 3), "\n")

# Check for outliers (> 3 SD from mean)
outlier_threshold <- 3
outliers <- fixed_imce$imce %>%
  mutate(z_score = scale(Econ_Majority)) %>%
  filter(abs(z_score) > outlier_threshold)

cat("Number of outliers:", nrow(outliers), "\n")
```

## Reproducibility

### Set Seed for Consistency
```{r reproducibility}
# Set seed before any random operations
set.seed(42)

# For CJBART (Bayesian MCMC)
fixed_cjbart_model <- cjbart(
  data = cjbart_data,
  Y = "selected",
  id = "ResponseId",
  type = "choice",
  # ... other parameters ...
  seed = 42  # Ensures reproducible results
)
```

### Session Info
```{r session-info}
# Always save session info for reproducibility
writeLines(capture.output(sessionInfo()), "session_info.txt")
```

### Version Control
- Use Git to track changes to analysis scripts
- Document any manual data cleaning steps
- Save intermediate datasets with date stamps

## Performance Optimization

### Parallel Processing
```{r parallel}
# Use multiple cores for CJBART
library(parallel)
num_cores <- detectCores() - 1  # Leave 1 core free

fixed_cjbart_model <- cjbart(
  data = cjbart_data,
  # ... parameters ...
  cores = num_cores
)
```

### Efficient Data Structures
```{r efficiency}
# Convert to data.table for faster operations on large datasets
library(data.table)
dt_large <- as.data.table(dt)

# Use dplyr for readable code
# Use data.table for speed on 100k+ rows
```

---

# References

## Key Papers

1. **Conjoint Analysis**:
   - Hainmueller, J., Hopkins, D. J., & Yamamoto, T. (2014). "Causal inference in conjoint analysis: Understanding multidimensional choices via stated preference experiments." *Political Analysis*, 22(1), 1-30.

2. **CJBART Method**:
   - Goplerud, M., Imai, K., & Pashley, N. (2022). "Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis." *arXiv preprint*.

3. **Heterogeneity Detection in Conjoint Experiments**:
   - Robinson, T. S., & Duch, R. M. (2024). "How to Detect Heterogeneity in Conjoint Experiments." *The Journal of Politics*, 86(2), 412-427.

4. **Thurstone Scaling**:
   - Thurstone, L. L. (1927). "A law of comparative judgment." *Psychological Review*, 34(4), 273-286.

## R Package Documentation

- **cregg**: [https://cran.r-project.org/package=cregg](https://cran.r-project.org/package=cregg)
- **cjbart**: [https://github.com/maxgoplerud/cjbart](https://github.com/maxgoplerud/cjbart)
- **tidyverse**: [https://www.tidyverse.org](https://www.tidyverse.org)

---

# Contact Information

**Project Lead**: Chloe Mortenson  
**Principal Investigator**: Dr. Erik Nisbet

For questions about the analysis pipeline, please contact the research team.

---

# Appendix: Quick Start Guide

## Minimal Working Example

```{r quick-start}
# 1. Install required packages
install.packages(c("tidyr", "dplyr", "cregg", "cjbart", "ggplot2"))

# 2. Set working directory
setwd("path/to/your/project")

# 3. Run analysis script
source("Analysis_Code_PP2025.Rmd")

# 4. Wait for completion (2-4 hours)

# 5. Generate figures
source("Figure_Code_PPfinal.Rmd")

# 6. Check output folders
list.files("output")   # Should show 5 .rds files
list.files("figures")  # Should show 9+ .png files
```

## One-Line Pipeline (Advanced)

```{r pipeline}
# Run entire pipeline (after setting up data path)
system("Rscript -e \"rmarkdown::render('Analysis_Code_PP2025.Rmd'); rmarkdown::render('Figure_Code_PPfinal.Rmd')\"")
```

---

**Last Updated**: `r Sys.Date()`

**Document Version**: 1.0
